Smart Cities and Homes by Mohammad S Obaidat & Petros Nicopolitidis

Author: Mohammad S Obaidat & Petros Nicopolitidis
Language: eng
Format: epub
ISBN: 9780128034637
Publisher: Elsevier Science
Published: 2016-05-04T16:00:00+00:00


Figure 10.5 Computing Optimal Policy for the Two-Smart Meter Examples

As discussed previously, the decision maker aims to choose the action with the maximum expected discounted reward. Assume the discount factor is 0.9 and that we look three steps ahead. At the current time slot t, there are two available actions. For each action taken at time slot t, there are two available actions at time slot t+1, so there are four action branches at time slot t+1 in total. Similarly, there are eight at time slot t+2. Computing the optimal expected discounted reward therefore amounts to finding the optimal path from the root to a leaf of the tree, a problem that dynamic programming, a standard technique, solves directly. […], and R(s1,a1) are all calculated in this approach.
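The three-step lookahead described above can be sketched as a memoized recursion, which is one simple form of dynamic programming. This is only an illustration under stated assumptions: it treats the state as fully observed (a simplification of the POMDP setting), and the state names, the reward table R, and the transition table P are hypothetical placeholders, not values from the chapter. Only the discount factor 0.9, the horizon of three steps, and the two actions per step come from the text.

```python
from functools import lru_cache

GAMMA = 0.9            # discount factor stated in the text
HORIZON = 3            # look three steps ahead, as in the text
ACTIONS = ("a0", "a1")  # two available actions per time slot

# Hypothetical rewards R[state][action]; not taken from the chapter.
R = {
    "s0": {"a0": 1.0, "a1": -1.0},
    "s1": {"a0": -5.0, "a1": 2.0},
}

# Hypothetical transition probabilities P[state][action] -> {next_state: prob}.
P = {
    "s0": {"a0": {"s0": 0.8, "s1": 0.2}, "a1": {"s0": 1.0}},
    "s1": {"a0": {"s1": 1.0}, "a1": {"s0": 1.0}},
}


@lru_cache(maxsize=None)
def best_value(state, steps_left):
    """Return (best expected discounted reward, best first action)."""
    if steps_left == 0:
        return 0.0, None  # leaf of the lookahead tree
    best = None
    for action in ACTIONS:
        # Expected value of the remaining steps after taking this action.
        future = sum(
            prob * best_value(next_state, steps_left - 1)[0]
            for next_state, prob in P[state][action].items()
        )
        q = R[state][action] + GAMMA * future
        if best is None or q > best[0]:
            best = (q, action)
    return best


value, action = best_value("s0", HORIZON)
print(f"best first action: {action}, expected discounted reward: {value:.4f}")
```

Because each (state, steps_left) pair is cached, the recursion evaluates every subtree once instead of re-walking all branches of the tree, which is what distinguishes the dynamic programming approach from brute-force enumeration of paths.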

Using the standard POMDP solver [29], a policy transfer graph can be computed, as shown in Fig. 10.6. Initially, no smart meter is hacked, so we start from node e0 and take the corresponding action a0. If the resulting observation is o0, we keep taking action a0. If the observation is o1, o2, or o3, we move to node e1 and take the corresponding action a1. After a1 is taken, o0 is the only possible observation, so the system returns to node e0.
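The policy graph just described behaves like a small finite-state controller, and a minimal sketch of it is given here. The node, action, and observation labels (e0, e1, a0, a1, o0 to o3) and the transitions follow the description above; the dictionary layout and the helper name run_policy are assumptions made for illustration, not output from the solver in [29].

```python
# Policy graph from the description: each node fixes an action, and the
# observation received after that action selects the next node.
POLICY_GRAPH = {
    "e0": {"action": "a0", "next": {"o0": "e0", "o1": "e1", "o2": "e1", "o3": "e1"}},
    "e1": {"action": "a1", "next": {"o0": "e0"}},  # o0 is the only possible observation
}


def run_policy(observations, start="e0"):
    """Follow the policy graph; return the (node, action, observation) trace
    and the node reached after the last observation."""
    node = start
    trace = []
    for obs in observations:
        action = POLICY_GRAPH[node]["action"]
        trace.append((node, action, obs))
        node = POLICY_GRAPH[node]["next"][obs]
    return trace, node


trace, final_node = run_policy(["o0", "o2", "o0"])
for node, action, obs in trace:
    print(f"at {node}: take {action}, observe {obs}")
print("final node:", final_node)
```

Storing the action with each node and keying transitions by observation mirrors how the graph is read in the text: the controller never needs the belief state at run time, only the current node and the latest observation.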


